Worst-case Analysis of Strategy Iteration and the Simplex Method

Author

  • Thomas Dueholm Hansen
Abstract

In this dissertation we study strategy iteration (also known as policy iteration) algorithms for solving Markov decision processes (MDPs) and two-player turn-based stochastic games (2TBSGs). MDPs provide a mathematical model for sequential decision making under uncertainty. They are widely used to model stochastic optimization problems in areas including operations research, machine learning, artificial intelligence, economics, and game theory. The class of two-player turn-based stochastic games is a natural generalization of Markov decision processes, obtained by introducing an adversary. 2TBSGs form an intriguing class of games whose status in many ways resembles that of linear programming 40 years ago. They can be solved efficiently in practice with strategy iteration algorithms, which resemble the simplex method for linear programming, but no polynomial time algorithm is known.

Linear programming is an exceedingly important problem with numerous applications. The simplex method was introduced by Dantzig in 1947 and has been studied extensively ever since. MDPs can be formulated as linear programs, which gives rise to the connection between strategy iteration and the simplex method. Strategy iteration and simplex-type algorithms are local search algorithms that repeatedly improve a current candidate solution. We say that the strategy iteration algorithm repeatedly performs improving switches, and that the simplex method repeatedly performs improving pivots. The strategy iteration algorithm and the simplex method are, in fact, paradigms for solving 2TBSGs, MDPs, and linear programs. To obtain concrete algorithms we must specify an improvement rule (pivoting rule). In this dissertation we mainly focus on three improvement rules:

  • Howard’s improvement rule: Update the current solution by simultaneously performing all improving switches.
  • RandomEdge: Update the current solution by performing a single uniformly random improving switch.
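To make the two improvement rules above concrete, the following is a minimal sketch of strategy iteration for a discounted MDP. It is an illustration under stated assumptions, not the dissertation's own formulation: the array layout (`P[s, a, s']` for transition probabilities, `r[s, a]` for rewards), the discount factor `gamma`, the tie-breaking choice in Howard's rule (take the best improving action in each state), and the toy `chain_mdp` instance are all hypothetical.

```python
import random
import numpy as np

def evaluate(P, r, policy, gamma):
    """Value of a fixed policy: solve (I - gamma * P_pi) v = r_pi."""
    n = P.shape[0]
    idx = np.arange(n)
    P_pi = P[idx, policy]          # (n, n) transition matrix under the policy
    r_pi = r[idx, policy]          # (n,) rewards under the policy
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def improving_switches(P, r, v, gamma, tol=1e-9):
    """All (state, action, q) triples whose one-step lookahead q exceeds v[state]."""
    q = r + gamma * (P @ v)        # (n, m) one-step lookahead values
    n, m = q.shape
    return [(s, a, q[s, a]) for s in range(n) for a in range(m)
            if q[s, a] > v[s] + tol]

def strategy_iteration(P, r, gamma, rule="howard", seed=0):
    """Run strategy iteration with Howard's rule or RandomEdge."""
    rng = random.Random(seed)
    policy = [0] * P.shape[0]      # assumption: start from action 0 everywhere
    iterations = 0
    while True:
        v = evaluate(P, r, policy, gamma)
        switches = improving_switches(P, r, v, gamma)
        if not switches:           # no improving switch: the policy is optimal
            return policy, v, iterations
        if rule == "howard":
            # Switch in every state that has an improving action; when a state
            # has several, take the one with the largest lookahead value
            # (an assumed tie-breaking choice).
            best = {}
            for s, a, q in switches:
                if s not in best or q > best[s][1]:
                    best[s] = (a, q)
            for s, (a, _) in best.items():
                policy[s] = a
        else:
            # RandomEdge: perform one uniformly random improving switch.
            s, a, _ = rng.choice(switches)
            policy[s] = a
        iterations += 1

def chain_mdp():
    """Hypothetical 3-state chain: action 0 stays, action 1 advances."""
    n, m = 3, 2
    P = np.zeros((n, m, n))
    r = np.zeros((n, m))
    for s in range(n):
        P[s, 0, s] = 1.0
        P[s, 1, min(s + 1, n - 1)] = 1.0
    r[n - 1, 1] = 1.0              # reward only for action 1 in the last state
    return P, r

if __name__ == "__main__":
    P, r = chain_mdp()
    for rule in ("howard", "random_edge"):
        policy, v, iters = strategy_iteration(P, r, gamma=0.9, rule=rule)
        print(rule, policy, v, iters)
```

Both rules share the same evaluation and switch-detection steps and differ only in which improving switches they apply, which is why they are described as instances of one paradigm.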

Publication date: 2012